Language Model for Cyrillic Mongolian to Traditional Mongolian Conversion
Authors
Abstract
Traditional Mongolian and Cyrillic Mongolian are two written forms of the Mongolian language, used in China and Mongolia respectively. Their spoken pronunciation is similar, but their writing systems are entirely different. A large proportion of Cyrillic Mongolian words have more than one corresponding form in Traditional Mongolian, which makes conversion from Cyrillic Mongolian to Traditional Mongolian a hard problem. To overcome this difficulty, this paper proposes a language-model-based approach that takes advantage of context information. Experimental results show that, for Cyrillic Mongolian words with multiple correspondences in Traditional Mongolian, this approach reaches an accuracy of 87.66%, which greatly improves overall system performance.
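The abstract does not say which kind of language model or search procedure is used. As a minimal sketch of the general idea only, the Python below assumes a bigram model with add-one smoothing, trained on Traditional Mongolian text, and a Viterbi search over a candidate table. The tokens (c_a, t_a1, and so on) are hypothetical placeholders rather than real Mongolian words, and the function names are invented for illustration.

```python
import math
from collections import defaultdict


def train_bigram(sentences):
    """Count bigrams over tokenized target-side (Traditional Mongolian) sentences."""
    unigrams = defaultdict(int)  # how often each token appears as a left context
    bigrams = defaultdict(int)
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    vocab_size = len({tok for sent in sentences for tok in sent})
    return unigrams, bigrams, vocab_size


def log_prob(prev, cur, model):
    """Add-one smoothed bigram log-probability (an assumed smoothing choice)."""
    unigrams, bigrams, vocab_size = model
    return math.log((bigrams.get((prev, cur), 0) + 1)
                    / (unigrams.get(prev, 0) + vocab_size))


def convert(source_words, candidates, model):
    """Pick one target form per source word with a Viterbi search."""
    # best maps the last chosen form to (score, path) of the best sequence ending in it.
    best = {"<s>": (0.0, [])}
    for word in source_words:
        options = candidates.get(word, [word])  # unknown words pass through unchanged
        new_best = {}
        for form in options:
            score, path = max(
                ((s + log_prob(prev, form, model), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[form] = (score, path + [form])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy data: "c_a" is an ambiguous source word with two target forms.
corpus = [["t_a1", "t_b"], ["t_a1", "t_b"], ["t_a2", "t_c"]]
candidates = {"c_a": ["t_a1", "t_a2"], "c_b": ["t_b"], "c_c": ["t_c"]}
model = train_bigram(corpus)

print(convert(["c_a", "c_b"], candidates, model))  # ['t_a1', 't_b']
print(convert(["c_a", "c_c"], candidates, model))  # ['t_a2', 't_c']
```

In the second call the less frequent form t_a2 is chosen because the following word favors it. This is the kind of context information that a dictionary lookup alone cannot use.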
Similar resources
Retrieval in Texts with Traditional Mongolian Script Realizing Unicoded Traditional Mongolian Digital Library
This paper discusses our approaches to creating a digital library of traditional Mongolian script using Unicode. We also introduce the system architecture of a digital library that stores books and materials of historical importance written in traditional Mongolian, which contain 1,000 years of history and are an important part of Mongolian culture. Specifically, we propose a technique that will allow us...
Cyrillic Mongolian Named Entity Recognition with Rich Features
In this paper, we first create a manually annotated Cyrillic Mongolian named entity corpus. The annotation types cover person names, location names, organization names and other proper names. Then, we use a Conditional Random Field as the classifier and design several categories of features for Mongolian, including orthographic feature, morphological feature, gazetteer feature, syllable feature, word cluster...
Extracting Loanwords from Mongolian Corpora and Producing a Japanese-Mongolian Bilingual Dictionary
This paper proposes methods for extracting loanwords from Cyrillic Mongolian corpora and producing a Japanese–Mongolian bilingual dictionary. We extract loanwords from Mongolian corpora using our own handcrafted rules. To complement the rule-based extraction, we also extract words in Mongolian corpora that are phonetically similar to Japanese Katakana words as loanwords. In addition, we corresp...
A Study of Traditional Mongolian Script Encodings and Rendering: Use of Unicode in OpenType fonts
This article discusses the rendering issues of complex text layouts, particularly traditional Mongolian script. Some standards such as Unicode and OpenType format have been implemented and are supported widely. Traditional Mongolian script has been standardized in Unicode. We analyzed existing OpenType fonts and their rendering schemes for traditional Mongolian script. We found some errors, and...
A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters
In the Mongolian language, many words share the same presentation form but represent different words with different codes. Since typists usually input words according to their presentation forms and sometimes cannot distinguish the codes, many coding errors occur in Mongolian corpora. This makes statistics and retrieval very difficult on such a Mongo...